Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context

Authors

  • Fei Yin
  • Qiu-Feng Wang
  • Cheng-Lin Liu
Abstract

Creating document image datasets with ground truths of regions, text lines and characters is a prerequisite for document analysis research. However, ground-truthing large datasets is not only laborious and time-consuming but also prone to errors, owing to the difficulty of character segmentation and the large variability of character shape, size and position. This paper describes an effective recognition-based annotation approach for ground-truthing handwritten Chinese documents. Under the Bayesian framework, the alignment of text line images with the text transcript, which is the crucial step of annotation, is formulated as an optimization problem that incorporates the geometric context of characters and a character recognition model. We evaluated the alignment performance on the Chinese handwriting database CASIA-HWDB, which contains nearly four million character samples of 7356 classes and 5091 pages of unconstrained handwritten texts. The experimental results demonstrate the superiority of recognition-based text line alignment and the benefit of integrating geometric context. On a test set of 1015 handwritten pages (10,449 text lines), the proposed approach achieved a character-level alignment accuracy of 92.32% when under-segmentation errors are included and 99.04% when they are excluded. A tool based on the proposed approach has been used in practice for labeling handwritten Chinese documents. © 2013 Elsevier Ltd. All rights reserved.
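
To illustrate how aligning a text line with its transcript can be posed as an optimization problem, here is a minimal sketch. It is not the authors' implementation. It assumes the line has already been over-segmented into primitive segments and that each transcript character covers a run of consecutive segments. The function name `align_transcript`, the `max_merge` limit and the `score` callable (standing in for a combined recognition and geometric log-score) are all hypothetical.

```python
from typing import Callable, List, Tuple


def align_transcript(
    num_segments: int,
    transcript: str,
    score: Callable[[int, int, str], float],
    max_merge: int = 4,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Assign each transcript character a run of consecutive segments.

    score(start, end, ch) is a hypothetical log-score for segments
    [start, end) forming character ch, e.g. a recognition log-likelihood
    plus geometric-context terms. Returns the best total score and the
    (start, end) span chosen for each character.
    """
    n, m = num_segments, len(transcript)
    neg_inf = float("-inf")
    # best[j][i]: best score aligning the first j characters to the first i segments
    best = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    # back[j][i]: number of segments merged for character j in that best path
    back = [[0] * (n + 1) for _ in range(m + 1)]
    best[0][0] = 0.0

    for j in range(1, m + 1):
        ch = transcript[j - 1]
        for i in range(j, n + 1):
            for k in range(1, min(max_merge, i) + 1):
                prev = best[j - 1][i - k]
                if prev == neg_inf:
                    continue
                cand = prev + score(i - k, i, ch)
                if cand > best[j][i]:
                    best[j][i] = cand
                    back[j][i] = k

    if best[m][n] == neg_inf:
        raise ValueError("no feasible alignment")

    # Trace back from the last character to recover the spans.
    spans: List[Tuple[int, int]] = []
    i = n
    for j in range(m, 0, -1):
        k = back[j][i]
        spans.append((i - k, i))
        i -= k
    spans.reverse()
    return best[m][n], spans
```

This sketch has no way to represent under-segmentation, where one primitive segment contains more than one character. Such lines either raise `ValueError` or produce a misaligned result.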

Similar resources

Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten

Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques and conversion of any handwritten document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...

Keyword spotting in unconstrained handwritten Chinese documents using contextual word model

Keywords: keyword spotting; Chinese handwritten documents; word similarity; contextual word model. This paper proposes a method for keyword spotting in off-line Chinese handwritten documents using a contextual word model, which measures the similarity between the query word and every candidate word in the document by combining a character classifier and the geometric context a...

Similar Handwritten Chinese Character Discrimination by Weakly Supervised Learning

Traditional approaches for handwritten Chinese character recognition suffer in classifying similar characters. In this paper, we propose to discriminate similar handwritten Chinese characters by using weakly supervised learning. Our approach learns a discriminative SVM for each similar pair which simultaneously localizes the discriminative region of similar character and makes the classificatio...

A Model of On-line Handwritten Japanese Text Recognition Free from Line Direction and Writing Format Constraints

This paper presents a model and its effect for on-line handwritten Japanese text recognition free from line-direction constraint and writing format constraint such as character writing boxes or ruled lines. The model evaluates the likelihood composed of character segmentation, character recognition, character pattern structure and context. The likelihood of character pattern structure considers...

Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition

Online handwritten Chinese text recognition (OHCTR) is a challenging problem as it involves a large-scale character set, ambiguous segmentation, and variable-length input sequences. In this paper, we exploit the outstanding capability of path signature to translate online pen-tip trajectories into informative signature feature maps, successfully capturing the analytic and geometric properties o...

Journal:
  • Pattern Recognition

Volume 46

Publication date: 2013